Abstract

Certification trails are a recently introduced and 
promising approach to fault-detection and fault- 
tolerance [11, 12]. Recent experimental work [13] 
reveals many cases in which a certification-trail approach
allows for significantly faster program execution than a
basic time-redundancy approach. Algorithms for answer
validation of abstract data types
are presented in [12] and allow a certification trail ap- 
proach to be used for a wide variety of problems. In 
this paper, we report on an attempt to assess the per- 
formance of algorithms utilizing certification trails on 
abstract data types. Specifically we have applied this 
method to the following problems: heapsort, Huffman 
tree, shortest path, and skyline. Previous results used 
certification trails specific to a particular problem and 
implementation. The approach in this paper allows 
certification trails to be localized to “data structure 
modules,” making the use of this technique transpar- 
ent to the user of such modules. 

Keywords: Software fault tolerance, certification 
trails, error monitoring, design diversity, data struc- 
tures. 


1 Introduction 

To explain the essence of the certification trail technique
for software fault tolerance, we first discuss 2-version
programming [4, 2]. Using 2-version (or more generally,
N-version) programming, two (or N) implementations of an
algorithm are executed on a given input, and the results
compared. If the outputs agree, they are accepted; otherwise
an error is flagged. This technique will detect a variety of
software faults as well as transient hardware faults. A
variation of this technique is to execute a single program
twice and compare results; this is called time redundancy.
Although there are a few software faults that may be detected
using time redundancy (e.g., uninitialized pointer errors), it
is more effective in catching transient faults.
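The time redundancy baseline described above can be sketched in a few lines. This is only an illustration: the function name run_with_time_redundancy and the choice of Python are assumptions made here, not part of the original work.

```python
def run_with_time_redundancy(algorithm, data):
    """Execute `algorithm` twice on the same input and compare the results.

    Hypothetical helper: the outputs are accepted only if both executions agree.
    """
    first = algorithm(list(data))    # each execution works on its own copy
    second = algorithm(list(data))
    if first != second:
        raise RuntimeError("error: the two executions disagree")
    return first


if __name__ == "__main__":
    print(run_with_time_redundancy(sorted, [3, 1, 2]))
```

The cost of this baseline is two full executions, which is the figure the certification trail method is compared against later in the paper.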

The certification trail technique is designed to 
achieve similar types of error detection capabilities but 
expend fewer resources. The central idea is to modify
the first algorithm so that it leaves behind a trail of
data which we call a certification trail. The second
algorithm may then make use of this data, which is 
chosen so that the algorithm executes more quickly 
and/or has a simpler structure than the first algo- 
rithm. As above, the outputs of the two executions 
are compared and are considered correct only if they 
agree. Note, however, we must be careful in defining 
this method or else its error detection capability might 
be reduced by the introduction of data dependency 
between the two algorithm executions. For example, 
suppose the first algorithm execution contains an error
which causes an incorrect output and an incorrect
certification trail of data to be generated. Further sup- 
pose that no error occurs during the execution of the 
second algorithm. It appears possible that the execu- 
tion of the second algorithm might use the incorrect 
trail to generate an incorrect output which matches 
the incorrect output given by the execution of the first 
algorithm. Intuitively, the second execution would be 
“fooled” by the data left behind by the first execution. 
The definitions we give below exclude this possibility. 
They demand that the second execution either gener- 
ates a correct answer or signals the fact that an error 
has been detected in the data trail. 

Early work on the certification trail focused on creating
trails for specific implementations of problems.
For example, the trail given in [11] for the convex hull
problem is specific to the Graham scan algorithm. In 
general, the two algorithms used in this approach can 
be quite different. A more recent approach is to con- 
struct a certification trail for an abstract data type. 
That is, given the answers to operations allowed on 
that type, our algorithm checks the correctness of 
these answers. This method has the advantage that 
the certification trail techniques are localized to the
routines implementing data structure operations, and
may then be applied to a wide variety of problems
without special coding. In many cases it may be possible
to use existing code with only minor modifications.
Code using these routines is run twice, the first
time generating the trail, the second time using it.
Alternatively, the trail checking may be done in parallel,
i.e., we perform the checking as the trail is being
generated. A programmer using a library of these routines
need not be familiar with certification trail techniques.
Object-oriented programming techniques may be particularly
useful for implementation of such “certified”
data types.



2 Formal Definition of a Certification Trail


In this section we will give a formal definition of a
certification trail and discuss some aspects of its
realizations and uses.

Definition 2.1 A problem $P$ is formalized as a relation,
i.e., a set of ordered pairs. Let $D$ be the domain
(that is, the set of inputs) of the relation $P$ and let
$S$ be the range (that is, the set of solutions) for the
problem. We say an algorithm $A$ solves a problem $P$
iff for all $d \in D$, when $d$ is input to $A$ then an $s \in S$ is
output such that $(d, s) \in P$.

Definition 2.2 Let $P : D \to S$ be a problem. A
solution to this problem using a certification trail consists
of two functions $F_1$ and $F_2$ with the following domains
and ranges: $F_1 : D \to S \times T$ and
$F_2 : D \times T \to S \cup \{\mathrm{error}\}$. $T$ is the set of certification trails. The
functions must satisfy the following two properties:

(1) for all $d \in D$ there exists $s \in S$ and there
exists $t \in T$ such that
$F_1(d) = (s, t)$ and $F_2(d, t) = s$ and $(d, s) \in P$

(2) for all $d \in D$ and for all $t \in T$
either ($F_2(d, t) = s$ and $(d, s) \in P$)
or $F_2(d, t) = \mathrm{error}$.


We also require that $F_1$ and $F_2$ be implemented
so that they map elements which are not in their
respective domains to the error symbol. The definitions
above assure that the error detection capability of the
certification trail approach is comparable to that
obtained with the simple time redundancy approach
discussed earlier. (That is, if transient hardware faults
occur during only one of the executions then either an
error will be detected or the output will be correct.)
It should be further noted, however, that the examples to
be considered will indicate that this new approach can
also save overall execution time.
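To make the two properties of the definition concrete, the following minimal sketch instantiates the two functions for sorting, with the sorting permutation as the trail. The example itself, the names f1, f2 and ERROR, and the use of Python are assumptions made here for illustration; sorting by permutation is not one of the constructions evaluated in this paper.

```python
ERROR = "error"


def f1(d):
    """First execution: sort d and leave behind a trail (the sorting permutation)."""
    trail = sorted(range(len(d)), key=lambda j: d[j])
    return [d[j] for j in trail], trail


def f2(d, t):
    """Second execution: rebuild the answer from the trail in linear time.

    Returns a correctly sorted list, or ERROR if the trail cannot be trusted.
    """
    n = len(d)
    if not isinstance(t, list) or len(t) != n:
        return ERROR
    seen = [False] * n
    out = []
    for j in t:
        # every index must be in range and used exactly once (t is a permutation)
        if not isinstance(j, int) or not 0 <= j < n or seen[j]:
            return ERROR
        seen[j] = True
        out.append(d[j])
    # the permuted input must be in non-decreasing order
    if any(out[k] > out[k + 1] for k in range(n - 1)):
        return ERROR
    return out


if __name__ == "__main__":
    data = [5, 2, 9, 2]
    answer, trail = f1(data)
    print(f2(data, trail) == answer)   # property (1): a good trail reproduces the answer
    print(f2(data, [0, 1, 2, 3]))      # property (2): a bad trail is rejected
```

Because f2 checks that the trail is a permutation and that the permuted input is ordered, it can never be "fooled" into returning an unsorted list, whatever trail it is given.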


3 Answer Validation Problem for Abstract Data Types

Our general approach to applying certification 
trails uses the concept of an abstract data type. Some 
examples of abstract data types are given later in this 
paper. Here we mention some important common 
properties and give a short illustration. Each abstract 
data type has a well defined data object or set of data 
objects. Each abstract data type has a carefully defined
finite collection of operations that can be performed
on its data object(s). Each operation takes a
finite number of arguments (possibly zero). In addition,
some but not all operations return answers. An
example of an abstract data type is a priority queue. 
The data object for a priority queue is an ordered pair 
of the form (i,k) where i is an item number and k is 
a key value. A priority queue has two operations:
insert(i, k) and delmin. The insert operation has two
arguments: item number i and key value k. The in- 
sert operation does not return an answer. The delmin 
operation has no arguments, but it does return an an- 
swer. The precise semantics of these operations are 
given later in this paper. 

For each abstract data type we may define an an- 
swer validation problem. Intuitively, the answer vali- 
dation problem consists of checking the correctness of 
a sequence of supposed answers to a sequence of op- 
erations performed on the abstract data type. More 
formally, the input to the answer validation problem 
is a sequence of operations on the abstract data type 
together with the arguments of each operation. In 
addition, the sequence contains the supposed answers 
for each of the operations which return answers. In 
particular, each supposed answer is paired with the 
operation that is supposed to return it. 

The output for the answer validation problem is the 
word “correct” if the answers given in the input match 
the answers that would be generated by actually performing
the operations. The output is the word “incorrect”
if the answers do not match. It is also useful
to allow the output to be the word “ill-formed”. This output
is used if the sequence of operations is ill-formed,
e.g., an operation has too many arguments or an ar- 
gument refers to an inappropriate object. 
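The input and output of the answer validation problem can be pinned down with a reference validator that simply replays the operations. The sketch below is an assumption made here for illustration: the trail encoding as (operation, arguments, answer) tuples, the name validate_by_replay, and the convention that delmin returns an (item, key) pair or the token "empty" are all hypothetical. It runs in O(n log(n)) time and is therefore not the efficient validation algorithm of [12]; it only fixes what "correct", "incorrect" and "ill-formed" mean.

```python
import heapq


def validate_by_replay(trail):
    """Return "correct", "incorrect" or "ill-formed" for a priority queue trail."""
    heap = []         # entries are (key, item): ties on key are broken by item number
    present = set()   # item numbers currently in the queue
    for entry in trail:
        if not (isinstance(entry, tuple) and len(entry) == 3):
            return "ill-formed"
        op, args, answer = entry
        if op == "insert":
            # insert takes exactly two arguments and the item must be new
            if len(args) != 2 or args[0] in present:
                return "ill-formed"
            item, key = args
            present.add(item)
            heapq.heappush(heap, (key, item))
        elif op == "delmin":
            if len(args) != 0:
                return "ill-formed"
            if heap:
                key, item = heapq.heappop(heap)
                present.discard(item)
                expected = (item, key)
            else:
                expected = "empty"
            if answer != expected:
                return "incorrect"
        else:
            return "ill-formed"   # unknown operation
    return "correct"


if __name__ == "__main__":
    trail = [("insert", (1, 5), None), ("insert", (2, 3), None),
             ("delmin", (), (2, 3)), ("delmin", (), (1, 5))]
    print(validate_by_replay(trail))
```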

The answer validation problem is similar to the idea 
of an acceptance test which is used in the recovery
block approach [10] to software fault tolerance. The 
main difference is that an answer validation problem 
is dependent upon a sequence of answers, not just an 
individual answer. Hence, if an incorrect answer appears
in the sequence, it may not be detected immediately.
It is guaranteed, however, that an incorrect answer
will be detected at some point during the processing
of the entire sequence. By allowing for this latency in 
detection, it is possible to create a much more efficient 
procedure for solving the answer validation problem. 

The most important aspect of the answer validation 
problem is the fact that it is often possible to check the
correctness of the answers to a sequence of operations 
much more quickly than actually calculating what the 
answers should be from scratch. In other words, the 
answer validation problem has a smaller time com- 
plexity than the original abstract data type problem. 
For example, to calculate the answers to a sequence 
of n priority queue operations takes Ω(n log(n)) time
in the decision tree model; however, it is possible to 
check the correctness of the answers in only O(n) time 
[12]. This speed is very useful in fault-detection ap- 
plications. 

It is possible to run an answer validation algorithm 
for some abstract data type concurrently with some 
algorithm which uses the abstract data type. The answer
validation algorithm could act as a monitor, making
sure that all interactions with the abstract data
type are handled correctly. This is valuable because 
many algorithms spend a large fraction of their time 
operating on abstract data types. Note, the overhead 
of this monitor is less than the overhead of actually 
performing the data type operations twice. 


4 Schema for using Certification Trails 

Suppose that we have developed an efficient solu- 
tion to the answer validation problem for some ab- 
stract data type. By efficient we mean the time com- 
plexity of the answer validation problem is smaller 
than the time complexity of the original abstract data 
type problem. Further, suppose that we wish to run 
an algorithm, say A, which uses that abstract data 
type. To apply the certification trail method we can 
use the following schema to yield the two executions: 

First Execution: 

Execute algorithm A. 

Each time an abstract data type operation is performed,
append to the certification trail the identity
of the operation, the arguments and the answer.


Second Execution:

Phase One: 

Validate the correctness of the operations and supposed
answers given in the certification trail. If the
validation returns “incorrect” or “ill-formed” then
output “error” and stop. Otherwise, continue.

Phase Two: 

Execute algorithm A. 

Each time an abstract data type operation is performed,
read the next entry in the certification trail.
Make sure that the operation and the arguments in the
certification trail agree with those requested in the
algorithm. If not, output “error” and stop. Otherwise,
use the answer given in the certification trail and
continue.
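The schema above can be sketched as a priority queue wrapper that either generates or uses a trail, selected by a mode parameter. Everything named here (CertifiedPQ, replay_validate, algorithm_a, certified_run, the tuple encoding of trail entries) is a hypothetical illustration, and the phase one validator is a plain replay used as a stand-in; a practical version would substitute an efficient answer validation algorithm such as the one in [12].

```python
import heapq


def replay_validate(trail):
    """Stand-in validator (replays the operations; not the O(n) method of [12])."""
    heap, present = [], set()
    for op, args, answer in trail:
        if op == "insert" and len(args) == 2 and args[0] not in present:
            present.add(args[0])
            heapq.heappush(heap, (args[1], args[0]))
        elif op == "delmin" and len(args) == 0:
            if heap:
                key, item = heapq.heappop(heap)
                present.discard(item)
                expected = (item, key)
            else:
                expected = "empty"
            if answer != expected:
                return "incorrect"
        else:
            return "ill-formed"
    return "correct"


class CertifiedPQ:
    """Priority queue that generates a trail ("generate") or reads one ("use")."""

    def __init__(self, mode, trail=None):
        self.mode = mode
        self.trail = [] if trail is None else trail
        self._heap = []
        self._next = 0   # index of the next trail entry to read in "use" mode

    def _use(self, op, args):
        # The requested operation and arguments must agree with the trail entry.
        if self._next >= len(self.trail):
            raise RuntimeError("error")
        rec_op, rec_args, answer = self.trail[self._next]
        self._next += 1
        if (rec_op, rec_args) != (op, args):
            raise RuntimeError("error")
        return answer

    def insert(self, item, key):
        if self.mode == "use":
            self._use("insert", (item, key))
            return
        heapq.heappush(self._heap, (key, item))
        self.trail.append(("insert", (item, key), None))

    def delmin(self):
        if self.mode == "use":
            return self._use("delmin", ())
        if self._heap:
            key, item = heapq.heappop(self._heap)
            answer = (item, key)
        else:
            answer = "empty"
        self.trail.append(("delmin", (), answer))
        return answer


def algorithm_a(values, pq):
    """Example algorithm A: sort `values` using only priority queue operations."""
    for item, key in enumerate(values, start=1):
        pq.insert(item, key)
    return [pq.delmin()[1] for _ in values]


def certified_run(values):
    first = CertifiedPQ("generate")
    out1 = algorithm_a(values, first)                 # first execution
    if replay_validate(first.trail) != "correct":     # second execution, phase one
        raise RuntimeError("error")
    out2 = algorithm_a(values, CertifiedPQ("use", first.trail))   # phase two
    if out1 != out2:
        raise RuntimeError("error")
    return out1


if __name__ == "__main__":
    print(certified_run([4, 1, 3]))
```

Note that algorithm_a is identical in both executions; only the mode passed to the data structure differs, which is what keeps the technique transparent to the caller.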

This schema can yield execution times which are 
significantly faster than the execution time obtained 
by running algorithm A twice. Yet the schemes yield 
comparable fault detection capabilities. Note, the first 
execution can be slower than a simple execution of al- 
gorithm A since it must output a certification trail. 
However, the second execution can be significantly
faster than a simple execution of the algorithm since
the interactions with the abstract data type take less 
time overall. The net effect can yield a major speed- 
up. 

Suppose an algorithm uses multiple abstract data 
types and suppose there are efficient answer validation 
algorithms for each of these abstract data types. It is 
easy to see how our method generalizes. We can leave
behind a generalized certification trail which consists
of a separate certification trail for each of the abstract
data types. The effect on the speed up of the second 
execution will be cumulative. 


5 Generalized Priority Queue 

We now describe a somewhat general abstract data 
type. We are able to solve the answer validation prob- 
lem for restricted versions of this data type. The data 
consists of a set of ordered pairs. The first element in 
these ordered pairs is referred to as the item number 
and the second element is called the key value. Or- 
dered pairs may be added and removed from the set, 
however, at all times the item numbers of distinct or- 
dered pairs must be distinct. It is possible, though, 
for multiple ordered pairs to have the same key value. 
In this paper the item numbers are integers between 
1 and n, inclusive. Our default convention is that i is
an item number, k is a key value and h is a set of ordered
pairs. A total ordering on the pairs of a set can
be defined lexicographically as follows: (i,k) < (i',k')
iff k < k' or (k = k' and i < i'). The abstract data
types we will consider support a subset of the following
operations.

member(i) returns a boolean value of true if the set
contains an ordered pair with item number i, otherwise
returns false.

insert(i,k) adds the ordered pair (i,k) to the set. We
require that no other pair with item number i be
in the set.

delete(i) deletes the unique ordered pair with item
number i from the set. We require that a pair
with item number i be in the set initially.

changekey(i,k) is executed only when there is an ordered
pair with item number i in the set. This
pair is replaced by (i,k).

deletemin returns the ordered pair which is smallest
according to the total order defined above and
deletes this pair. If the set is empty then the
token “empty” is returned.

min returns the ordered pair which is smallest according
to the total order defined above. If the set is
empty then the token “empty” is returned.

max and deletemax are similar to min and deletemin,
using the largest element instead of the smallest one.

If an operation violates one of the requirements described
above then it is considered to be ill-formed.
Also, if an operation has the wrong number or type of
arguments it is considered to be ill-formed.
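The semantics of these operations can be summarized in a small reference implementation. This is a sketch under stated assumptions: the class name GeneralizedPQ and the IllFormed exception are hypothetical, and the dictionary with a linear scan is chosen for clarity only, so min and max cost O(n) here rather than the logarithmic cost of heaps or balanced search trees.

```python
import builtins


class IllFormed(Exception):
    """Raised when an operation violates one of the stated requirements."""


class GeneralizedPQ:
    def __init__(self):
        self._key = {}   # item number -> key value

    def member(self, i):
        return i in self._key

    def insert(self, i, k):
        if i in self._key:
            raise IllFormed(f"item {i} is already in the set")
        self._key[i] = k

    def delete(self, i):
        if i not in self._key:
            raise IllFormed(f"item {i} is not in the set")
        del self._key[i]

    def changekey(self, i, k):
        if i not in self._key:
            raise IllFormed(f"item {i} is not in the set")
        self._key[i] = k

    def _extreme(self, pick):
        if not self._key:
            return "empty"
        # Compare (key, item) tuples: key first, item number breaks ties.
        k, i = pick((k, i) for i, k in self._key.items())
        return (i, k)

    def min(self):
        return self._extreme(builtins.min)

    def max(self):
        return self._extreme(builtins.max)

    def deletemin(self):
        pair = self.min()
        if pair != "empty":
            del self._key[pair[0]]
        return pair

    def deletemax(self):
        pair = self.max()
        if pair != "empty":
            del self._key[pair[0]]
        return pair


if __name__ == "__main__":
    q = GeneralizedPQ()
    q.insert(1, 5)
    q.insert(2, 5)
    q.insert(3, 7)
    print(q.min(), q.max())
```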

Many different types and combinations of data
structures can be used to support different subsets of
these operations efficiently. Specifically we are interested
in allowing the insert, delete, min, and deletemin
operations. It is possible to process a sequence of O(n)
operations in O(n log(n)) time with implementations using
heaps or balanced search trees such as AVL trees [1],
red-black trees [«] or B-trees [3]. Answer validation
of these operations can be performed in O(n) time
[12, U].

6 Examples of the use of Data Structure Certification

In this section we evaluate the use of certification
trails for data structures as applied to four well-known
and significant problems in computer science: sorting,
the shortest path tree problem, the Huffman tree problem,
and the skyline problem. We have implemented
basic algorithms for these problems and algorithms
which generate and use certification trails. Timing
data was collected using a SPARCstation ELC.

The timing information reported in the tables con- 
sists of the run time of the basic algorithm (i.e., no 
certification trail), the run time of the trail-generating
algorithm, the run time of the trail-using algorithm, 
the percentage savings of using certification trails, and 
the speedup achieved by the second phase of the certi- 
fication trail method. The percentage savings is com- 
puted by comparing the total run time of algorithms 
for generating and using trails against twice the run 
time of the basic algorithm. The speedup is computed 
by dividing the run time of the basic algorithm by the 
run time of the algorithm that uses the certification
trail.
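The two derived quantities can be written out explicitly. In the sketch below the function names and the sample timings are hypothetical and chosen only to exercise the formulas; they are not measurements from this paper.

```python
def percentage_savings(basic, generate, use):
    """Savings of (generate + use) relative to running the basic algorithm twice."""
    return 100.0 * (1.0 - (generate + use) / (2.0 * basic))


def speedup(basic, use):
    """Run time of the basic algorithm divided by that of the trail-using algorithm."""
    return basic / use


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    basic, generate, use = 2.0, 2.5, 0.5
    print(percentage_savings(basic, generate, use))
    print(speedup(basic, use))
```

With these definitions, a positive percentage savings means that generating and then using the trail is cheaper than two executions of the basic algorithm, even when the trail-generating run is slower than the basic run.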

Apart from the data structures, the implementa- 
tion of both phases of the certification trail version of 
each algorithm is nearly identical to the implementa- 
tion of the basic version. The only difference in the 
code for the two phases is a parameter passed to the 
data structure code indicating whether a certification 
trail should be generated or used. All code implementing
the certification trails is localized to the modules
implementing the data structures, allowing the gener- 
ation and use of the trail to be transparent to the user 
of these modules. Due to space constraints only an 
abbreviated discussion of the algorithms is given. 


6.1 Heapsort 


Sorting is a fundamental operation in computer systems,
and there exist several sorting algorithms. Sorting
may be implemented with a priority queue (or
more specifically, a heap) by inserting all elements
and performing deletemin operations until the queue
is empty.

Input data was generated by creating sets of integers
chosen uniformly from the interval [0, 10000000].
Timing results are based on fifty executions at each
input size.

6.2 Huffman Tree 

Given a sequence of frequencies (positive integers), 
we wish to construct a Huffman tree, i.e., a binary tree 
with frequencies assigned to the leaves, such that the 
sum of the weighted path lengths is minimized. This
is a classic algorithmic problem and one of the original
solutions was found by Huffman [7]. It has been used
extensively in data compression algorithms through
the design and use of so-called Huffman codes. The
tree structure and code design are based on frequencies
of individual characters in the data to be compressed.
In this paper we are concerned only with the Huffman
tree; the interested reader should consult [7] for
information about the coding application. 

The Huffman tree is built from the bottom up and 
the overall structure of the algorithm is based on the 
greedy “merging” of subtrees. An array of pointers, 
ptr, is used to point to the subtrees as they are con- 
structed. Initially, n single vertex subtrees are con- 
structed, each one associated with a frequency num- 
ber in the input. The algorithm repeatedly merges the 
two subtrees with the smallest associated frequency 
values, assigning the sum of these frequencies to the 
resulting tree. A priority queue data structure allows 
the algorithm to quickly find the subtrees to merge at 
each step. 
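The greedy merging just described can be sketched with a priority queue keyed on frequency. The representation of a subtree as a (frequency, left, right) tuple, the function names, and the use of Python's heapq are assumptions made here for illustration; only the ptr array and the merge rule come from the description above.

```python
import heapq


def huffman_tree(freqs):
    """Build a Huffman tree; each subtree is a tuple (frequency, left, right)."""
    ptr = [(f, None, None) for f in freqs]          # n single-vertex subtrees
    heap = [(f, j) for j, f in enumerate(freqs)]    # (frequency, index into ptr)
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, a = heapq.heappop(heap)                 # two smallest frequencies
        f2, b = heapq.heappop(heap)
        ptr.append((f1 + f2, ptr[a], ptr[b]))       # merged subtree
        heapq.heappush(heap, (f1 + f2, len(ptr) - 1))
    return ptr[heap[0][1]] if heap else None


def weighted_path_length(tree, depth=0):
    """Sum over the leaves of frequency times depth."""
    freq, left, right = tree
    if left is None:
        return freq * depth
    return (weighted_path_length(left, depth + 1)
            + weighted_path_length(right, depth + 1))


if __name__ == "__main__":
    root = huffman_tree([5, 9, 12, 13, 16, 45])
    print(root[0], weighted_path_length(root))
```

Every interaction with the heap here is an insert or a deletemin, so the queue could be replaced by a trail-generating or trail-using priority queue without touching the merge loop.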

Data for the timing experiments was generated by 
choosing integer frequencies uniformly from the range 
[0,100000]. Timing results are based on fifty executions
for each input size.

6.3 Shortest Path 

Given a graph with non-negative edge weights and 
a source vertex, we wish to find the shortest paths 
from the source vertex to each of the other vertices. 
This is another classic problem and has been examined 
extensively in the literature. Our approach is applied 
to Dijkstra’s algorithm.

Dijkstra’s algorithm is a greedy algorithm. At each
step, there exists a set of vertices S to which shortest
paths are known, and a set T of vertices adjacent to
members of this set. The best paths known to members
of T are examined, and the vertex v with the
minimum path length is removed from T and added to
S. A data structure that supports insert, delete, and
deletemin can be used to implement this algorithm. 
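A compact version of this greedy step is sketched below. It departs from the description in one respect that should be stated plainly: instead of an explicit delete operation it uses lazy deletion, leaving stale entries in the heap and skipping them when they surface, because Python's heapq offers no delete. The adjacency-list format and the function name are likewise assumptions made here.

```python
import heapq


def dijkstra(adj, source):
    """Shortest path lengths from `source`.

    adj maps each vertex to a list of (neighbor, weight) pairs, weight >= 0.
    """
    dist = {source: 0}
    done = set()             # the set S of vertices with known shortest paths
    heap = [(0, source)]     # tentative path lengths for the vertices in T
    while heap:
        d, v = heapq.heappop(heap)        # deletemin
        if v in done:
            continue                      # stale entry, deleted lazily
        done.add(v)
        for w, weight in adj.get(v, []):
            nd = d + weight
            if w not in dist or nd < dist[w]:
                dist[w] = nd
                heapq.heappush(heap, (nd, w))   # insert the improved path length
    return dist


if __name__ == "__main__":
    graph = {0: [(1, 7), (2, 2)], 1: [(3, 1)], 2: [(1, 3), (3, 8)], 3: []}
    print(dijkstra(graph, 0))
```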

Input graphs of |V| vertices and |E| edges were generated
by choosing a set of |E| distinct edges uniformly
from all possible such sets, then rejecting graphs that
were not connected. |E| was chosen sufficiently large
that each selection is connected with high probability,
resulting in few rejections. The input sizes were chosen
to keep the ratio |E|/|V| constant, for in practice
the running time of the algorithm is affected by this
ratio. Timing results are based on fifty executions at
each input size. The size column of Table 3 contains
an ordered pair indicating the number of vertices and
edges.

6.4 Skyline 

Given a set of rectangles with collinear bottom
edges, the skyline is the figure resulting from removing
all hidden edges. The problem of computing
the skyline of a set of rectangular buildings by elim- 
inating hidden lines is discussed in [8]. The method 
used is divide and conquer and it constructs a skyline
in O(n log(n)) time. In this paper we use a plane
sweep algorithm that can be easily implemented in 
terms of operations on priority queues. Plane sweep 
algorithms are widely used for computational geom- 
etry problems [9], and typically use a priority queue 
for event scheduling, and may be amenable to use of 
certification trail techniques. 

Using a plane sweep algorithm, we compute the
skyline as follows. Initialize a vertical sweep line to
the left of all the rectangles (we may assume that all
rectangles are to the right of the y-axis). As we sweep
the line to the right we maintain a collection of the
heights of the rectangles encountered. For each rectangle
R, the height of R is added to the collection
when we encounter R’s left edge and removed when
we encounter its right edge. The height of the skyline
at any point x_0 is the maximum height in the collection
when the sweepline is at x = x_0. Details are
below. A structure supporting insert and deletemin is
all that is needed to order the events, and a structure
supporting insert, max, and delete is required to store
the rectangle heights. A priority queue (supporting
insert and deletemin) can be used to order the sweepline
events, and a generalized priority queue to store the
rectangle heights.
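The sweep can be sketched with two heaps, one for the events and one for the heights. Several choices below are assumptions made here rather than details of the original implementation: rectangles are (left, right, height) triples resting on the x-axis, the skyline is reported as a list of (x, height) points where the height changes, and the delete operation on the height collection is realized by lazy deletion with a counter.

```python
import heapq
from collections import Counter


def skyline(rects):
    """Return the points (x, height) at which the skyline height changes."""
    events = []                                  # event queue: insert + deletemin
    for left, right, h in rects:
        heapq.heappush(events, (left, 0, h))     # 0: left edge, add the height
        heapq.heappush(events, (right, 1, h))    # 1: right edge, remove the height
    heights = []          # max-heap of active heights, stored negated
    removed = Counter()   # heights whose deletion is still pending
    result = []
    current = 0
    while events:
        x = events[0][0]
        # Process every event at this position of the sweep line.
        while events and events[0][0] == x:
            _, kind, h = heapq.heappop(events)
            if kind == 0:
                heapq.heappush(heights, -h)
            else:
                removed[h] += 1
        # Discard heights at the top of the heap that were already removed.
        while heights and removed[-heights[0]] > 0:
            removed[-heights[0]] -= 1
            heapq.heappop(heights)
        top = -heights[0] if heights else 0      # max of the collection
        if top != current:
            result.append((x, top))
            current = top
    return result


if __name__ == "__main__":
    print(skyline([(1, 5, 3), (2, 4, 5), (6, 8, 2)]))
```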

Input data was generated by choosing integral rectangle
heights uniformly over the range [0, 100000].
The x-coordinates of the left edges were chosen uniformly
over the range [0, 90000] and the width of
each rectangle was chosen uniformly over a range
ending at 10000. Timing results are based on twenty
executions for each input size.


7 Conclusions 

The experimental data in this paper shows the utility
of the certification trail approach using abstract
data types. This paper supplements [13] which provides
experimental data illustrating the advantages of
implementation specific certification trails over classical
time redundancy. We have shown that the more
general approach of checking abstract data types also
provides performance superior to classical time redundancy.
This is significant because a wide range of algorithms
may be represented as a sequence of operations
on abstract data types. The certification trail
approach may therefore be used on these programs,
without requiring per problem “ad hoc” techniques.
Creation of library routines or class libraries for these
data types allows the certification trail technique to be
used transparently, and may allow its use with only
minor modifications of existing code.